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Quantitative model checkers for Markov Decision Processes typically use finite-precision arithmetic. 
If all the coefficients in the process are rational numbers, then the model checking results are rational, 
and so they can be computed exactly. However, exact techniques are generally too expensive or 
limited in scalability. In this paper we propose a method for obtaining exact results starting from an 
approximated solution in finite-precision arithmetic. The input of the method is a description of a 
scheduler, which can be obtained by a model checker using finite precision. Given a scheduler, we 
show how to obtain a corresponding basis in a linear-programming problem, in such a way that the 
basis is optimal whenever the scheduler attains the worst-case probability. This correspondence is 
already known for discounted MDPs, we show how to apply it in the undiscounted case provided 
that some preprocessing is done. Using the coiTespondence, the linear-programming problem can be 
solved in exact arithmetic starting from the basis obtained. As a consequence, the method finds the 
worst-case probability even if the scheduler provided by the model checker was not optimal. In our 
experiments, the calculation of exact solutions from a candidate scheduler is significantly faster than 
the calculation using the simplex method under exact arithmetic starting from a default basis. 

1 Introduction 

Model checking of Markov Decision Processes (Mdps) has been proven to be a useful tool to verify and 
evaluate systems with both probabilistic and non-deterministic choices. Given a model of the system un- 
der consideration and a qualitative property concerning probabilities, such as "the system fails to deliver 
a message with probability at most 0.05", a model checker deduces whether the property holds or not for 
the model. As different resolutions of the non-deterministic choices lead to different probability values, 
verification techniques for Mdps rely on the concept of schedulers (also called policies, or adversaries), 
which are defined as functions choosing an option for each of the paths of an Mdp. Model-checking 
algorithms for Mdps proceed by reducing the model-checking problem to that of finding the maximum 
(or minimum) probability to reach a set of states under all schedulers H . 

Different techniques for calculating these extremal probabilities exist: for an up-to-date tutorial, 
see [91. Some of them (for instance, value iteration) are approximate in nature. If all the coefficients 
in the process are rational numbers, then the model checking results are rational, and so they can be 
computed exactly. However, exact techniques are generally too expensive or limited in scalability. Linear 
programming (LP) can be used to obtain exact solutions, but in order to achieve reasonable efficiency it 
is often carried out using finite-precision, and so the results are always approximations. (We performed 
some experiments showing how costly it is to compute exact probabilities using LP without our method.) 
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In addition, the native operators in programming languages like Java have finite precision: the extension 
to exact arithmetic involves significant reworking of the existing code. 

We propose a method for computing exact solutions. Given any approximative algorithm being able 
to provide a description of a scheduler, our method shows how to extend the algorithm in order to get 
exact solutions. The method exploits the well-known correspondence between model-checking problems 
and linear programming problems H, which allows to compute worst-case probabilities by computing 
optimal solutions for LP problems. We do not know of any similar approach to get exact values. This 
might be due to the fact that, from a purely theoretical point of view, the problem is not very interesting, as 
exact methods exists and they are theoretically efficient: the problem is that, in practice, exact arithmetic 
introduces significant overhead. 

The simplex algorithm [5] for linear programming works by iterating over different bases, which are 
submatrices of the matrix associated to the LP problem. Each basis defines a solution, that is, a valuation 
on the variables of the problem. The simplex method stops when the basis yields a solution with certain 
properties, more precisely, a so-called feasible and dual feasible solution. By algebraic properties, such 
a solution is guaranteed to be optimal. 

The core of our method is the interpretation of the scheduler as a basis for the linear programming 
problem. Given a scheduler complying with certain natural conditions, a basis corresponding to the 
scheduler can be used as a starting point for the simplex algorithm. We show that, if the scheduler is 
optimal, then the solution associated to the corresponding basis is feasible and dual feasible, and so a 
simplex solver provided with this basis needs only to check dual feasibility and compute the solution 
corresponding to the basis. As our experiments show, these computations can be done in exact arith- 
metic without a huge impact in the overall model-checking time. In fact, using the dual variant of the 
simplex method, the time to obtain the exact solution is less than the time spent by value iteration. If the 
scheduler is not optimal, the solver starts the iterations from the basis. This is useful for two reasons: 
we can let the simplex solver finish in order to get the exact solution; or, once we know that we are not 
getting the optimal solution, we can perform some tuning in the model checker as, for instance, reduce 
the convergence threshold (we also show a case in which the optimal scheduler cannot be found with 
thresholds within the 64-bit IEEE 754 floating point precision). 

The correspondence between schedulers and bases is already known for discounted Mdps (see, for 
instance 13). We show the correspondence for the undiscounted case in case some states of the system are 
eliminated in preprocessing steps. The preprocessing steps we consider are usual in model checking ||9|: 
given a set of target states, one of the preprocessing algorithms removes the states that cannot reach the 
target, while the other one removes the states that can avoid reaching the target. These are qualitative 
algorithms based on graphs that do not perform any arithmetical operations. 

The next section introduces the preliminary concepts we need along the paper. Section[3]presents our 
method and the proof of correctness. The experiments are shown in Section]?] The last section discusses 
related results concerning complexity and policy iteration. 

2 Preliminaries 

We introduce the definitions and known-results used throughout the paper, concerning both Markov 
decision process and linear programming. 
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2.1 Markov decision processes 

Definition 1. Let Dist(A) denote the set of discrete probability distributions over the set A. A Markov 
Decision Process (Mdp) M is a pair (S,T) where S is a finite set of states and T C S x Dist(S) is a set 
of fra?i5/f/o?i5 Given n = {s,d) G T, the value d{t) is the probability of making a transition to t from 
s using IX. We write instead of d{t), and write state(/x) for s. We define the set en{s) as the set of 
all transitions jj. with state(/x) = s. For simplicity, we make the usual assumption that every state has at 
least one enabled transition: en{s) 7^ for all s E S. 

We write 5 A- f to denote jx G en{s) A > 0. A path in an Mdp is a (possibly infinite) sequence 
p = s^.^Ks^.- ■ ■ ..}X".s'\ where ^' € en(5''"') and ix'{s^) > for all /. If p is finite, the last state of p is 
denoted by last(p), and the length is denoted by len(p) (a path having a single state has length 0). Given 
a set of states U, we define reach(?7) to be the set of all infinite paths p = s'^\|J.^ .s^. • • • such that s' £U 
for some /. 

The semantics of Mdps is given by schedulers. A scheduler 77 for an Mdp M is a function 77 : S — T 
such that ri{s) £ en{s) for all s. In words, the scheduler chooses an enabled transition based on the 
current state. For all schedulers T], f E S, the set Paths(f, T]) contains all the paths /'./x^.^'. • • • ./x''.^" such 

that s^ = t, jJ.' = ri{s'^^) and s'^^ — > s' for all /. The reader familiar with Mdps might note that we are 
restricting to Markovian non-randomized schedulers (that is, they map states to transitions, instead of 
the more general schedulers mapping paths to distributions on transitions). As explained later on, these 
schedulers suffice for our purposes. 

The probabihty Ptj^ (p) of the path p under rj starting from t is nl™i''' l^'i^') if P G Paths(f, T]). If 
p Paths (f, 77), then the probability is 0. We often omit the subindices M and/or t if they are clear from 
the context. 

We are interested on the probability of (sets of) infinite paths. Given a finite path p, the probability 
of the set p^ comprising all the infinite paths that have p as a prefix is defined by Pr^(p^) = Pr''(p). 
In the usual way (that is, by resorting to the Caratheodory extension theorem) it can be shown that the 
definition on the sets of the form p^ can be extended to a-algebra generated by the sets p^. 

The verification of PCTL* |4] and G)-regular formulae [6| (for example Ltl) can be reduced to the 
problem of calculating maXi,,j Pr^7 (reach(?7)) (or mini^,j Pr^? (reach(?7))) for Mdps M', states s and sets 
U obtained from the formula. 

In consequence, in the rest of the paper we concentrate on the following problems. 

Definition 2. Given an Mdp M, an initial state s and set of target states U, a reachability problem 
consists of computing max^ Pr^'' (reach(?7)) (or minj, Pr^'' (reach(?7))). 

From classic results in Mdp theory (for these results applied to model checking see, for instance, |[T] 
Chapter 3]) there exists a scheduler t]* such that 



for all 5 G S. That is, 77* attains the maximum probability for all states. 

An analogous result holds for the the case of minimum probabilities. There exists 77 * such that 



77* = arg max Pr^'' (reach (?7)) 



(1) 



77* = argminPr^"^ (reach (?/)) 



(2) 



Defining transitions as pairs helps to deal with the case in which the same distribution is enabled in several states 
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for all s £S. 

Even in a more general setting allowing for non-Markovian and randomized schedulers, it can be 
proven that we can assume T]* to be Markovian and non-randomized. The existence of T]* justifies our 
restriction to Markovian and non-randomized schedulers. 

Markov chains A Markov chain (Mc) is an Mdp such that \en{s)\ = 1 for all s £ S. Note that a 
Markov chain has exactly one scheduler, namely the one that chooses the only transition enabled in each 
state. Hence, for Markov chains, we often disregard the scheduler and denote the probability of reaching 
U asPr^(reach([/)). 

Definition 3. Given an Mdp M = (S, T) and a scheduler rj, we define the Markov chain M J, t] = (S, T') 
where /i e T' iff T](state(/i)) = /i. 

A simple application of the definitions yields 

Pr^i" (reach([/)) = Pr^,^^ (reach([/)) . (3) 

2.2 Linear programming 

We use a particular canonical form of hnear programs suitable for our needs. It is based on ||5] Appendix 
B], which is also a good reference for all the concepts and results given in this subsection. 
A linear programming problem consists in computing 

min{cx | Ax = b A x > 0} , (4) 

X 

given a constraint matrix A, a constraint vector b and a cost vector c. In the following, we assume that 
A has m rows and m + n columns, for some m > and « > 0. Hence, c is a row vector with m + n 
components, and b is a column vector with m components. 

A solution is any vector x of size m + n. The i-th component of x is denoted by x,. We say that 
that a solution is feasible if Ax = b and x > 0; it is optimal if is feasible and cx is minimum over all 
feasible x. A problem is feasible if it has a feasible solution, and bounded if it has an optimal solution. 
A non-singular mxm submatrix of A is called a basis. We overload the letter B to denote both the basis 
and the set of indices of the corresponding columns in A. A variable x^ is basic ifk^B. Note that, given 
our assumptions on the dimension of the constraint matrix, for all bases there are m basic variables and 
n non-basic variables. Given a basis B, and any vector t, let be the subvector of t having only the 
components in B. When B is clear from the context, we use N to denote the set of columns not in B, and 
use t'^ accordingly. For a matrix A, let A^ be submatrix of A having only the columns that are not in B. 
The solution x induced by the basis B is defined as x^- = for all k ^ B, while the values for ^ G B are 
given by the vectorial equation x^ = B^h. A solution x is basic if there is a basis that induces x. Given 
B and k £ N, the reduced cost of a variable x^ is defined as Ck — c^B^^A^, where A^ is the k-th column 
of A. A solution is dual feasible if it correspond to a basis such that > for all k £ N. 

In our proofs we make use of the following lemma, which is particular to our canonical form. 

Lemma 1. 

Ax = b ifx is basic . 

Proof. By splitting A into basic and non-basic columns we get Ax = Bx^ +A^x^ = BB^h +A^0 = 
Ih = h . (Note that x might not be feasible as it could be x ^ 0.) □ 
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Correctness of the simplex method reUes on the following well-known facts about LP problems: 

• Every solution that is both feasible and dual feasible is optimal 

• If there exists an optimal solution, then there exists a basic solution that is feasible and dual feasible 
(and hence optimal) 

As the problems we deal with are ensured to be bounded and feasible, we assume that there exists 
an optimal solution. In this context, the simplex algorithm explores different bases until it finds a basis 
whose corresponding solution is feasible and dual feasible. 

In several implementations of the algorithm the starting basis can be specified (when it is not, a de- 
fault one is used). The initial basis does not need to be feasible nor dual feasible. In case the starting 
basis complies with both feasibilities, the simplex algorithm finishes after checking that these feasibil- 
ities are met, without any further exploration. In Subsection 2.3 we show how reachability problems 
correspond to LP problems. In Section[3]we show that, under a certain assumption on the model checker 
(Assumption[T]l, a basis can be obtained from the scheduler provided by the model checker. In particular, 
optimal schedulers yield feasible bases (Theorem[3]). Under our assumption, all the bases obtained from 
schedulers are dual feasible (Theorem |4]l. 

Among the different variants of the simplex method, in our experiments (Section |4]l we use the dual 
simplex, which first looks for a dual-feasible basis (in the so-called first phase) and next tries to find a 
feasible one while keeping dual feasibility (in the second phase). This is appropriate in our case since, 
under our assumptions, the first phase is not needed (as formalized in Theorem |4]). In contrast to the 
dual simplex, the primal simplex (or, simply, simplex) looks for a feasible basis in the first phase. As 
a consequence, if iterations are required (according to our results in Section [3} this is case in which the 
model checker fails to provide the optimal scheduler), then the primal simplex performs both phases. 
However, both variants can be used and, as our experiments show, the starting basis obtained from the 
scheduler is useful to save iterations. In the few cases in which PRISM did not provide the optimal 
schedulers, the dual simplex required less iterations than the primal one; both of them perform far better 
when starting from a basis corresponding to a near-optimal scheduler than when starting from the default 
basis (see Section |4]l. 



2.3 Linear programming for Markov decision processes 

Linear programming can be used to compute optimal probabilities for some of the states in the system. 
The set of states whose maximum (minimum, resp.) probability is is first calculated using graph-based 
techniques |9, Sec. 4.1]. This qualitative calculation is often considered as a preprocessing step before the 
proper quantitative model checking. Given a set of target states U, let S""^"" be the set of states S such that 
max,^ Pr^*^ (reach (?7)) = 0. Similarly, let S™"'' be the set of states such that min,j Pr^'' (reach ([/)) = 0. 
When focusing on maximum probabilities, we write the set S \ (S™^"*' U ?7) as S ■ (called the set of maybe 
states), while for minimum probabilities S" is S \ (S™"° UU). 

The maximum probabilities for s G S^^^o are by definition of S™^"*'. For s £ U the probabilities 
are 1, since when starting from a state in U, the set U is reached in the initial state, regardless of the 
scheduler The minimum probabilities for ^S""'"" are by definition of S™"*^, and the probabilities for 
s €U are again 1. Next we show how to obtain the probabilities for the states in S', thus covering all the 
states in the system. 
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In order to avoid order issues, we assume that the states are S ■ = 5i , • • • , 5„ and the transitions are: 

J = Hi,---,H,„ (5) 

in such a way that if = state(/Xj), Sf = state(/Xy) and / < /', then j < f (from Def.[Tj recall that state(/X;) 
is the state in which /x,- is enabled). Roughly speaking, the transitions are ordered with respect to the states 
in which they are enabled. From now on, we use this orderings consistently throughout the paper. 

In the following theorem, the matrix A\I associated to a reachability problem maxPr^'' (reach(?7)) is 
amx (n + m) matrix whose last m columns form the identity matrix. We define of Ajj for the column 
j <n as: Aij = IJ.i{sj) if sj ^ state(/i,), or A,j- = \li(sj) — 1 if Sj = state(/x,). The vector b is defined as 

Theorem 1. For all states st G S', the value maxjj Pr^'''(reach(?7)) is the value of the variable Xi in an 
optimal solution of the following LP problem: 

n m 

min (I,--- ,1,0,--- ,0)x 
(A I /'«x'«)x = b 
x>0. 

Analogously, the value min,| Pi'm'^ (reach(?7)) is the value of the variable x,- in an optimal solution of 
the following LP problem. 

n m 

min -(l~^^,6~^^)x 

(-A I /'»x'«)x = -b ^ 
x>0. 

(Note that, in the constraint, the matrix A is negated, while I is not.) 

This theorem is just the well-known correspondence between reachability problems and LP prob- 
lems lfTTIl .ll9l Section 4.2], written in our LP setting. 

The variables that multiply the columns in the identity matrix are called slack variables in the LP 
literature. They are also the variables x^ in the following notation. 

Notation 1. From now on, we identify each column 1 < j < « of{A\I) with the state sj, and each column 
n < j < n + m with the transition Each row i is identified with /I,-. In consequence, we write Af^^^for 
the elements of the matrix, and x^ or x^ for the components of the solution x. 



3 A method for exact solutions 

Our method serves as a complement to a model checker being able to: 

• calculate the set S - , and 

• give a description of a scheduler, that the model checker considers optimal based on finite precision 
calculations 

We only require a weak "optimality" condition on the scheduler returned by the model checker, which we 
refer to as apt: we say that a scheduler 77 is apt iff Pr^'' (reach(?7)) > for all s G S - . In order words, we 
only require the scheduler to reach U for all states that can reach it (no matter with which probabihty). In 
the case of minimum probabilities, every scheduler is apt, since if we have Pr^'^ (reach(?7)) = for some 
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TJ, then s^S- (by definition of S™""). For the case of the maximum, the existence of an apt scheduler 
follows from the definition of S', the scheduler rj* in (ml being a suitable witness. 



Assumption 1. We assume that the model checker is able to provide an apt scheduler, in the sense that 
our method is not guaranteed to return a value in case the scheduler is not apt. 

Our method is described in the Algorithm [T] The function construct_problemconstructs the LP 



input : An Mdp M and a set of states U 

output: X such that = max^ Pr^''(reach(?7)) (min^ Pr^''(reach(?7)), resp.) for all G 

1 // Use model checker to get the set S' and a scheduler 

2 (S-,T7) <r- reach_analysis(M, U); 
^ *r- construct _problem(M, S''); 

4 B,j <— construct_basis(^, T]); 

5 start_simplex_solver(^, Bfj) ; 

6 if the exact simplex solver finishes in one iteration then 

7 I return arg minx obtained from the solver; 

8 else if the solver performs several iterations then // Tj is not optimal 



return argminx^, obtained from the solver once it finishes; 

// Or interrupt the solver and change the model checker parameters 



9 
10 

11 else if the solver reports that the basis is singular then 

12 // For the minimum, this case cannot happen 

13 error T] is not apt; 

14 end 

Algorithm 1: Method to get exact solutions 



problems (j6]l and (|7]). Given 77, the basis Brj obtained by construct _basisis defined as 

sGBrj, for all sgS'^ Xf^GB^ <;=^ rj (state(/i)) ^ jj. . (8) 

Roughly speaking, the basis contains all states, and all the transitions that are not chosen by rj. Some- 
times (particularly in the proof of Theorem |4jl we write Bm',?) to make it clear that the basis belongs to 
an Mdp M'. 

The rest of this section is devoted to prove the correctness of the algorithm, in the sense made precise 
by the following theorem (which is proven later). 

Theorem 2. If the algorithm returns a value, then the value corresponds to the output specification. 
Moreover, if the scheduler r\ provided by the model checker is apt, then the matrix defined by Q is a 
basis, and the algorithm returns optimum values from the LP solver If the scheduler provided by the 
model checker is optimum as in ([T]l, then the basis in (|8]l is both feasible and dual feasible. 

Recall from Subsection |2.2| that the simplex algorithm stops as soon as it finds a solution that is 
feasible and dual feasible. Hence, the fact that an optimal scheduler yields a basic, feasible and dual 
feasible solution causes the simplex solver to stop as soon as the feasibility checks are finished. 

The rest of this section is devoted to prove Theorem |2] In our proofs we resort to the following 
definitions and lemmata. The first definition uses indices as explained in Notation [T] 



24 



Efficient computation of exact solutions for quantitative model checking 



Definition 4. Given a scheduler rj, we write the set of transitions complying with T] (state (/x)) = jJ. as 
Trj = • • • and we assume that this ordering respects the ordering in pi. We define to be 
the nxn matrix whose elements are as C^j = IJ.'{sj). Consider the matrix A in (rob. We define (Air/) to 
be the nxn submatrix of A comprising all the rows G T,j and the columns s for all s G S ■ . 

Lemma 2. The transitions /i' G Tjj comply with state(/x') = st for all st G S ■. In consequence, rj (si) = /i'. 

Proof. Note that since the order in T,j respects the order in (jsjl, we have that the sequence state(/i '),••• , 
state(/x") is a sequence of states Sj^ , • • • with ji < • • • < jn- Since there are n states, and for each state 
s we have exactly one transition /i such that ri{s) = jj., it must be sj^ = ^i, • • • ,Sj^ = s„. This implies 
state(/x') = Sj. = Si as desired. Using this equality and jJ.' G T,j we have ri{si) = T](state(/i')) = jj.'. □ 

Lemma 3. For all rj, we have (A | T] ) = — /. 

Proof. By definition of (A|t]) and the definition of the matrix A in Q we have (A|T]), y = A^, = 
l^'i^j) ~ Qij^ where Qjj = 1 if state(/x') = Sj, or otherwise Qi j = 0. By Lemma[2j we have state(/i') = sj 
iff / = 7. Hence Qij is the identity matrix and (A|t]); j = jU'(5y) —lij = Cj'j—Iij, which completes the 
proof. □ 

The matrix (A | T] ) happens to be very important in our proofs. We profit from the fact that it is 
non-singular provided that 77 is apt. 

Lemma 4. For all apt T], the matrix (A^T]) is non-singular 

Proof. Suppose, towards a contradiction, that there exists x 7^ such that (A|t])x = 0. Then, by 
Lemma [3} we have {C^ — = 0, which implies C^x = x and hence (C'^)^x = x for all z > 0. We 
arrive to a contradiction by showing that for all j there exists z such that 

\{{C'^fx)j\<max\x,,\. (9) 

.v' 

In particular, for q = argmaxy \xsi\ this yields \{{C^Y's)q\ < \xq\, which contradicts (C'^)^x = x. 

Now we prove (j9|l. Since 77 is apt, from every Sj G S- there exists a path p G Paths (i'j, Tj) with 
last(p) G U , such that all the states previous to last(p) are not in U . We prove that z can be taken to be 
len(p). We proceed by induction on the length of p. If len(p) = 1, by Lemma[2]we have r\{sj){u) = 
li^{u) > for some u €U, and henc^^^^s? < 1- Taking z = 1 we obtain 

\{C^x)j\ = I ^ IJ--'{t)x,\ < 2^ iJ.^{t)\xt\ < 2^ /i-'(f)max|;c.v'| < max|;c.v/| , 

teS'^ reS' teS' 

which proves that we can take z = 1 = len(p). The last strict inequality holds only if maxy \xsi\ > 0, 
which follows from x 7^ 0. 



^The result for discounted Mdps does not use as the analogous of this sum is always less than 1 due to the discounts 
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If len(p) = 1 + 1, there exists Sq G S- such that pi^{sq) > and q reaches U inl steps. The inductive 
hypothesis holds for q, and hence | {{C^yx)q\ < maxy \xsi\, from which we obtain: 

;eS'\K} 

= £ +At^-(.,)|((C'')'x),l< I M^'(Omax|x,,| +At^-(.,)|((C'iyx),| 

< Y\ IJ.-' {t)max\xsi\ + {sq)ma.x\xsi\ <ma.x\xs'\ 

This finishes the proof of (j9]). Assuming that (A4,T7)x = for some x 7^ 0, we derived which 
contradicts (C )~x = x for all z > 0, thus finishing the proof. □ 

Lemma 5. For all apt schedulers T], the basis defined in is non-singular. 

Proof. We show that the equation 5^x = holds only if x = 0. Note that the vector x has one component 
for each column of the basis, that is, one component for each state in S- (called x^), and one component 
for each transition such that T](state(/i)) 7^ pL (called x^). The matrix equation B^x = corresponds to 
m equations, one for each transition. \i pL ^B^, since t ^B-q for all t G S', the equation corresponding to 
is 

Y,Af,^tXt+x^=0 . (10) 

If jj. ^ Brj, the corresponding equation is 

£a^,^x, = o. (11) 

(Note that the sum term x^ has disappeared. This corresponds to the fact that the column corresponding 
to x^ is not in the basis.) Since the transitions jj. ^ Bn are those such that 77(state(/x)) = /i, the set of 
equations ( [IT] ) is equivalent to (A4,T])s = 0, where s is the subvector of x having only the components 
corresponding to states. In consequence, if B;jX = holds, then in particular (A|t])s = and, since rj is 
apt, by Lemma|4]it must be s = 0, that is, x, = for all t G S ■ . Using this in ( 10 1 we have x^ = for all 
ji ^Brj. We have proven xy = for every component j of x, thus showing x = 0. □ 

Theorem 3. If a scheduler is optimal as in ^ (or Q, resp.) then the solution induced by the basis By^ 
is feasible. 

Proof. Let x be the solution induced by Brf for some optimal T]. By Lemma [T| we need to prove x > 0. 
We prove this inequality by showing that x, = Pr^*^ (reach(?7)) > for all s and x^ > for all ji. 

Since in Brj the variables x^ G T,j are non basic, in the solution x'' induced by Brj we have x^ = 
for all /X G T,j. Then, using Lemma [T] for our particular constraint matrix A|/, we obtain 

L ^(^)(^)^' + L^(^)(0 ■ (12) 

This is equivalent to (A4,T])x = q for some vector q. By Lemma |4j there exists exactly one x satisfy- 
ing ( 12 1. Let V? be Pr^''(reach(J7)). A classic result for Mdps (see, for instance, [Qj Section 4.2], |[T] 
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Theorem 3.10]) states that, for an optimal scheduler rj, it holds 



and 



v?= max Vm(Ov.^ + Im(0 (13) 



ri{s) G arg max £ n{t)v'^ + £ M(0 



for all states s. From the last two equations: 



v?=Ir7(.)(Ov,'' + £r7(.)(0. 

reS- teu 



This is equivalent to (A|t])v'' = q as before. After (12i we have seen that this equation has a unique 
solution, and so Xs = v? for all s £ S- . By ( pjj ) we have 

> £ Kt)xt + £Ai(0 (14) 
for all 5 G S ■ , /I G en(5). Applying Lemma[l]to our particular constraint matrix A|/, we have 

teS- teu 



Hence, > for all /i by ( 14 1. In conclusion, x^ = v? > for all G S' and > for all jx. Then, the 
solution X induced by is feasible. 

For the case of the minimum, the analogue of ([T3]) is: 



v?= min £M(Ov.'' + lAi(0 (15) 

/^eenCO^gS^ teu 

The fact that the equation {A\,ri)\^ = q has a unique solution again yields x^ = v'^. For x^, using the 
constraint matrix —All for the minimum and (flsll we obtain 



xn = -X, + X At(?K + £ iU(0 = L M(0 + E > . 

teS'^ teu teu teS'' 

□ 

Theorem 4. Given an apt scheduler T], f/je solution induced by the basis Br^ is dual feasible. (For the 
definition of dual feasible see Subsection 2.2 ) 

Proof. First we find a matrix expression for Suppose we reorder the rows of B^ so that the rows 
corresponding to transitions in the basis occur first. The resulting matrix is 

B'^ 

where A' is a submatrix of Brj . We can write 

B'^=PB^ (16) 



A' 


j{m—n) X (m—n) 


(A|T]) 
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where P is a permutation matrix. In order to find the inverse of B'^ we pose the following matrix equation: 



A' 


j{m—n)x{m—n) \ 


/ All 


A12 \ 




(AiT]) 


J 


V A21 


A22 J 








= 


/ A'Aii+A2i 


A'A 12 +A22 








I (AiT7)Aii 


(A|t])Ai2 



jnxn 








j{m—n) X [m—n) 



These equations, suggest that we can take An =0, and hence A21 = 
r\)^^ (which exists by Lemma [s]) and hence A22 = — A'(A|t])^^. 
checked by verifying that B'^^^B'^ = I 



B'^ ^ 



Qnx{m—n) 




j{m—n) X (m—n) 


-A'{Ai^)-' 



I. Moreover, it must be A12 = (AJ, 
The equation below can be easily 



(17) 



Next, we use ( [TT] ) to show that the reduced costs depend only on the constraint coefficients of the 
transitions chosen by the scheduler. 

We consider first the case of the maximum. Recall that our constraint matrix is A|/ and the costs 
associated to the transitions variables are for all /i (see Q). According to the definition of reduced cost 
(see Subsection 2.2 1, to prove dual feasibility we need to show — c^''B^^/^ > for all n ^ Bn, where 



is the column of the identity matrix corresponding to /i. From ( I61, we have Bj^' = B'^^P, and hence 
our inequality is —c^'^B'^^PI^ > 0. Since P is a permutation matrix, we know that PI^ is a column of 
the identity matrix, say 4(^1)- Given our costs in toI, and given the definition of S^, we have that c^^ 



is the vector ( 1 , • • • , 1 , 0, • • • ,0), and hence from ( 17 1 we get c^iS'^ ' 
conclusion, we have proven 



(0 



lx(m— n) -ilxn 



,llx"(AiT]) 



and we must prove that this number is greater than or equal to for all }JL ^B^. 



In 



(18) 



Whenever k{ii) <m — n, the result holds since ( 18 1 is 0. 

In case k{lJL) > m — n, we prove the result using the fact that these values depend only on the transi- 



tions chosen by T] . In fact, given the Mdp M and the scheduler T] , if we write ( 1 8 1 for the Markov chain 
M I T] (see Def . |3]l, we obtain 



-1 



(Mir]),i] 



-li>^«(AiT])-i/^ 



(19) 



for all /I Bi^t4ir]),ri- Note that for M 4, Tj there is no need to reorder (as there are no transitions in the 
basis) and so /X = k{lJL). Given that all the transitions M J, T] are chosen by 77, the basis B(MiT]),rj contains 
all the states and no transitions. In this equation, can be any column of /"^" (again, due to the fact that 
there are no transitions in the basis). 



to 



'"(AiT] 



'-k(il)-{m-n) 



Suppose, towards a contradiction, that ( 18 1 is less than for some k{ii) > m — n. This is equivalent 
-11 



< 0. By (19 1 we have -li^"(A | T])-i /, 



^/ < for some /x' in M | 77. 



Then, the solution induced by the basis is not dual feasible for the problem associated to M 4, T] . As 
there is at least one optimal basic and dual feasible solution (the one found by simplex method), there 
exists an optimal solution x*' such that the corresponding basis BF is not B(m|jj),77- As in M | T] there 
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exists only one basis containing all states (namely B{M|rj),7])' there exists s ^ B*-. In consequence, we 
have = 0. Since x*^ is optimal, by Theorem|Tj we obtain Pr^l^j (reach(?7)) = 0, from which (jsjl yields 
Pr^'' (reach (?/)) = 0. This contradicts the fact that Tj is apt. 

The proof for the case of the minimum is completely analogous: despite the differences in the con- 
straints and the cost vector, the reduced costs in ( fTS] ) are the same as before: 

These values again coincide with the ones in a system having only the transitions chosen by r/ . □ 

Proof ( of Theorem [2]). If the algorithm returns a value, then it is arg minx where ^ is the problem Q 
(or ^ for the minimum). Hence, by Theorem [TJ the returned value coincides with the output speci- 
fication. We have that if rj is apt, then Bf, is a basis by Lemma [5] As a consequence, the algorithm 
never enters the branch in line [TT] and so the result is returned. The termination in a single iteration 
is a consequence of the fact that the solution corresponding to an optimal scheduler rj is both feasible 
(Theorem|3]l and dual feasible (Theorem|4]|. □ 



4 Experimental results 

Implementation. We implemented our method by extending the model checker PRISM |fTO'|, using the 
Lp library glpk L2J- We compiled glpk using the library for arbitrary precision gmp. We needed to 
modify the code of glpk: although there is a solver function that uses exact arithmetic internally, this 
function does not allow us to retrieve the exact value. Aside from these changes to glpk and some 
additional code scattered around the PRISM code (in order to gather information about the scheduler), 
the specific code for implementing our method is less than 300 lines long. With these modifications. 
Prism is able to print the numerator and the denominator of the probabilities calculated. 

Our implementation works as follows: in the first step, we use the value iteration already imple- 
mented in Prism to calculate a candidate scheduler. In the next step, the Lp problem is constructed by 
iterating over each state: for each transition enabled, the corresponding probabilities are inserted in the 
matrix. The basis is constructed along this process: when a transition is considered, the description of 
the scheduler (implemented as an array) is queried about whether this transition is the one chosen by the 
scheduler. Next we solve the LP problem. For the reasons explained in Subsection 2.2 in Algorithm [T] 



we use the dual simplex method, except when we compare it to the primal one. The reader familiar with 
glpk might notice that the dual variant is not implemented under exact arithmetic on glpk: to overcome 
this, instead of providing glpk with the original problem, we provided the dual problem and retrieved 
the values of the dual variables (the dual problem is obtained by providing the transpose of the constraint 
matrix and by negating the cost coefficients, and so it does not affect the running time). 

The experiments were carried out on an Intel 17 @3.40Ghz with 8Gb RAM, running Windows 7. 

Case studies. We studied three known models available from the Prism benchmark suite llT4l . where 
the reader can look for matters not explained here (for instance, details about the parameters of each 
model). For the parameters whose values are not specified here, we use the default values. In the IEEE 
802.11 Wireless LAN model, two stations use a randomised exponential backoff rule to minimise the 
likelihood of transmission colhsion. The parameter N is the number of maximum backoffs. We compute 
the maximum probability that the backoff counters of both stations reach their maximum value. The 
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LP without Alg. 


1 




Algorithm|l 




Model 


Para- 
meters 


n= S" 


m 


Primal 


Dual 


Value 
iter. 


LP 

constr. 


Duan 

simplex 


Total 


Wlan 


3 


2529 


96302 


19.53 


11.76 


0.36 


0.05 


0.03 


0.44 


(N) 


4 


5781 


345000 


110.32 


61.83 


2.30 


0.21 


0.06 


2.57 




5 


12309 


1295218 


535.76 


326.64 


14.93 


1.32 


0.15 


16.40 


Consensus 


3,3 


3607 


3968 


251.74 


35.32 


2.93 


0.04 


0.15 


3.12 


(N,K) 


3,4 


4783 


5216 


488.84 


64.00 


6.47 


0.06 


0.58 


7.11 




3,5 


5959 


6464 


1085.70 


105.36 


12.74 


0.06 


1.87 


14.67 




4,1 


11450 


12416 




432.98 


2.88 


0.11 


0.19 


3.18 




4,2 


21690 


22656 




1951.91 


20.41 


0.23 


0.37 


21.01 




4,3 


31930 


32896 










59.73 


0.49 


0.58 


60.80 




4,4 


42170 


43136 










134.62 


0.64 


0.78 


136.04 




4,5 


52410 


53376 










246.90 


0.91 


0.96 


248.77 


Fiiewire 


200 


1071 


80980 


4.50 


2.65 


0.28 


0.04 


0.01 


0.33 


(D) 


300 


23782 


213805 




1314.32 


2.89 


1.04 


0.24 


4.17 




400 


81943 


434364 










11.05 


8.74 


O.i 


58 


20.67 



Table 1: Comparison of primal and dual simplex starting from a default basis against Algorithm [T] 



second model concerns the consensus algorithm for N processes of Aspnes & Herlihy |3|, which uses 
shared coins. We calculate the maximum probability that the protocol finishes without an agreement. 
The parameter K is used to bound a shared counter. Our third case study is the IEEE 1394 Fire Wire 
Root Contention Protocol (using the PRISM model which is based on |[T2ll ). We calculate the minimum 
probability that a leader is elected before a deadline of D time units. 

Linear programming versus Algoritiim [1} Table [T] allows us to compare (primal and dual) simplex 
starting from a default basis, against Algorithm [T] which provides a starting basis from a candidate 
scheduler. Aside from the construction of the Mdp from the PRISM language description (which is 
the same either using L? or Algorithm [T] and is thus disregarded in our comparisons), the steps in our 
implementation are: (1) perform value iteration to obtain a candidate scheduler; (2) construct the L? 
problem; (3) solve the problem in exact arithmetic in zero or more iterations (the latter is the case in 
which the scheduler is not optimal). All these times are shown in Table [TJ as well as its sum, expressed 
in seconds. The experiments for L? were run with a time-out of one hour (represented with a dash). 

Our method always outperforms the naive application of LP. The case with the lowest advantage is 
Consensus (3,5), and still our method takes less than 1/6 of the time required by dual simplex. 

With respect to the time devoted to exact arithmetic in Algorithm [TJ in all cases the simplex under 
exact arithmetic takes a fraction of the time spent by the other operations of the algorithm (namely, to 
perform value iteration and to construct the L? problem). In Consensus (3,5), the simplex algorithm 
takes less than 1/6 of the time devoted to the other operations. In all other cases the ratio is even lower. 

The greatest number found was 28821938103543398400, the denominator in the solution of Firewire 
400. It needs 65 bits to be stored. The computations were performed using 32 bit libraries, and so the 
exact arithmetic computations used around 3 words in the worst case (which is not really a challenge for 
an arbitrary precision library). We can conclude that, even for systems with more than 10000 states (up 
to 80000, in our experiments), the overhead introduced by exact arithmetic is manageable. 

Suboptimal sciiedulers as suboptimal bases. Other than measuring whether the calculation is reason- 
ably quick in case the scheduler from PRISM is optimal, a secondary measuring concerns how close is 
the basis to an optimal one in case the scheduler provided by PRISM is not optimal. 
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Primal 


Dual 


Iterations 


Time (seconds) 


Iterations 


Time (seconds) 




6 


7 


16 


6 


7 


16 


6 


7 


16 


6 


7 


16 


Consensus (3,3) 


187 


134 





3.43 


2.43 


0.10 


10 


6 





0.19 


0.15 


0.10 


Consensus (3,4) 


2497 


6278 





74.42 


202.58 


0.14 


37 


28 





0.63 


0.51 


0.13 


Consensus (3,5) 


4990 


4340 


1239 


190.53 


160.44 


49.487 


94 


61 


6 


1.93 


1.24 


0.25 



Table 2: Time spent when the starting basis is not optimal 



Except in cases Consensus (3,-), simplex stopped after iterations, thus indicating that PRISM was 
able to find the optimal scheduler. For optimal schedulers there is no difference between using primal 
or dual simplex in Algorithm [T] (we ran the experiments and the running time of the simplex variants 
differed by at most 0.05 seconds). 

The probabilities obtained in each step of the value iteration converge to those of an optimal sched- 
uler. Given a threshold e, value iteration stops only after \xs —x'^\ < £ for all s, where x and x' are the 
vectors obtained in the last two iterations. 

In Table |2] we compare the amount of iterations and the time spent by primal and dual simplex for 
schedulers obtained using different thresholds. We considered only the cases Consensus (3,-), as in other 
cases the scheduler returned by PRISM was optimum except for gross thresholds above 0.05, which are 
rarely used in practice (the default e in PRISM is 10^^). In addition to the default value, we considered 
representatives the value 10^' (since 10^^ already yields the exact solution for (3,3) in the dual case: 
a value smaller than 10^^ would have yielded uninteresting numbers for this case) and the value 10^^^, 
since in (3,5) the scheduler does not improve beyond such threshold. In fact, for 10^'^ the result is 
the same as for 10^^^^, and lO^^^'* is not a valid double. In Java, the type double corresponds to a 
IEEE 754 64-bit floating point. 

In consequence, we have one case (namely. Consensus (3,5)), where PRISM cannot find the worst- 
case scheduler for any double threshold (and thus should be recoded to use another arithmetic primitives 
to get exact results), while our method is able to calculate exact results using less than two seconds after 
value iteration, as shown in Table [T] 

For Consensus (3,-) we see that dual simplex performs betters than primal simplex. Consensus (3,4) 
shows that the primal simplex can behave worse when starting from Bf, than the dual simplex starting 
from the default basis (compare with the corresponding row in Table [TJ. Moreover, it can be the case that 
it takes more time as the threshold decreases (note that, in contrast, in Consensus (3,5) the time decreases 
with the threshold, as expected). This suggests that the dual variant should be preferred over the primal. 

Comparing against Table [TJ we see that, for each variant of the simplex method, starting from the 
basis results in a quicker calculation than starting from the default basis. 

It is worth mentioning that in all cases the difference between the probabilities provided by PRISM 
and the exact values was less than the threshold for value iteration. It indicates that the finite precision 
of the computations does not affect the results significantly. 

5 Discussion and further work 

Linear programming versus policy iteration. It is known that the dual simplex method applied for 
discounted Mdps is just the same as policy iteration (for an introduction to this method see |j9l) seen from 
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a different perspective. Indeed, this has been used to obtain complexity bounds (see |fT3]| ). Theorems [3] 
and|4]establish for undiscounted Mdps the same correspondence between basis and schedulers as known 
for the discounted case, and as a consequence the dual simplex is policy iteration disguised, also in the 
undiscounted case. 

Even without considering the results in this paper at all, exact solutions can also be calculated by 
implementing policy iteration with exact arithmetic as, in each iteration, the method calculates the prob- 
abilities corresponding to a scheduler and checks whether they can be improved by another scheduler. 
Roughly speaking, if the calculation and the check are performed using exact arithmetic, then the result 
is also exact. 

Despite this existing alternative, the correspondence between bases and schedulers we presented in 
this paper allows to obtain an exact solution by using LP solvers, thus profiting from all the knowledge 
concerning L? problems (and from existing implementations such as glpk). 

Complexity. To the best of our knowledge, the precise complexity of the simplex method in our case is 
unknown. There are recent results for the simplex applied to similar problems. For instance, in (13) it is 
proven that simplex is strongly polynomial for discounted Mdps. Nevertheless, [ 8 1 shows an exponential 
lower bound to calculate rewards in the undiscounted case. Unfortunately (or not, as there is still hope 
that we can prove the time to be polynomial in our case), the construction used in |[8l cannot be carried 
out easily to our setting, as some of the rewards in the construction are negative (and the equivalent to 
the rewards in our setting are the sums LfGf/M(0)- 

Further work. In the comparison of our method against LP, we considered only the simplex method, as 
glpk only implements this method in exact arithmetic. The feasibility/applicability of other algorithms 
to solve Lp problems using exact arithmetic is yet to be studied. 

The fact that the probabilities obtained are exact allows to prove additional facts about the system 
under consideration. For instance, the exact values can be used in correctness certificates, or be the input 
of automatic theorem provers, if they require exact values to prove some other properties of the system. 
We plan to concentrate on these uses of exact probabilities. 

Acknowledgements. The author is grateful to David Parker, Vojtech Forejt and Marta Kwiatkowska for 
useful comments and proofreading. 
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